
Keyword Search Result

[Keyword] deep learning (149 hits)

Showing 61-80 of 149 hits

  • Gradient Corrected Approximation for Binary Neural Networks

    Song CHENG  Zixuan LI  Yongsen WANG  Wanbing ZOU  Yumei ZHOU  Delong SHANG  Shushan QIAO  

     
    LETTER-Biocybernetics, Neurocomputing

    Publicized: 2021/07/05  Vol: E104-D No:10  Page(s): 1784-1788

    Binary neural networks (BNNs), in which both activations and weights are radically quantized to {-1, +1}, can massively accelerate the run-time performance of convolutional neural networks (CNNs) on edge devices by reducing computational complexity and saving memory footprint. However, the non-differentiable binarizing function used in BNNs makes the binarized models hard to optimize and introduces significant performance degradation compared with the full-precision models. Many previous works corrected the backward gradient of the binarizing function with various improved versions of straight-through estimation (STE) or with gradual approximation approaches, but the gradient suppression problem was not analyzed and handled. We therefore propose a novel gradient corrected approximation (GCA) method to bridge the discrepancy between the binarizing function and its backward gradient in a gradual and stable way. Our work has two primary contributions: the first is to approximate the backward gradient of the binarizing function using a simple leaky-steep function with a variable window size; the second is to correct the gradient approximation by standardizing the backward gradient propagated through the binarizing function. Experimental results show that the proposed method outperforms the baseline by 1.5% Top-1 accuracy on the ImageNet dataset without introducing extra computation cost.
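
    As a rough illustration of the core idea, a minimal PyTorch-style sketch of a binarizer whose backward pass uses a leaky-steep surrogate gradient followed by gradient standardization is given below; the class name, window size w, and leak alpha are illustrative assumptions, not the paper's exact formulation.

        import torch

        class LeakySteepBinarize(torch.autograd.Function):
            """sign() in the forward pass; a leaky-steep surrogate gradient,
            then gradient standardization, in the backward pass (illustrative)."""

            @staticmethod
            def forward(ctx, x, w=1.0, alpha=0.01):
                ctx.save_for_backward(x)
                ctx.w, ctx.alpha = w, alpha
                return torch.sign(x)

            @staticmethod
            def backward(ctx, grad_out):
                (x,) = ctx.saved_tensors
                # Steep slope inside the window |x| < w, small leak outside it.
                inside = (x.abs() < ctx.w).float()
                surrogate = inside / ctx.w + (1.0 - inside) * ctx.alpha
                grad_in = grad_out * surrogate
                # Gradient correction: standardize the gradient passed through the binarizer.
                grad_in = (grad_in - grad_in.mean()) / (grad_in.std() + 1e-8)
                return grad_in, None, None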

  • Health Indicator Estimation by Video-Based Gait Analysis

    Ruochen LIAO  Kousuke MORIWAKI  Yasushi MAKIHARA  Daigo MURAMATSU  Noriko TAKEMURA  Yasushi YAGI  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2021/07/09  Vol: E104-D No:10  Page(s): 1678-1690

    In this study, we propose a method to estimate body composition-related health indicators (e.g., the ratios of body fat, body water, and muscle) using video-based gait analysis. This method is more efficient than individual measurement using a conventional body composition meter. Specifically, we designed a deep-learning framework with a convolutional neural network (CNN), where the input is a gait energy image (GEI) and the output consists of the health indicators. Although a vast amount of training data is typically required to train the network parameters, it is infeasible to collect sufficient ground-truth data, i.e., pairs consisting of the gait video and the health indicators measured using a body composition meter for each subject. We therefore use a two-step approach to exploit an auxiliary gait dataset that contains a large number of subjects but lacks the ground-truth health indicators. In the first step, we pre-train a backbone network using the auxiliary dataset to output gait primitives such as arm swing, stride, the degree of stoop, and body width, which are considered to be relevant to the health indicators. In the second step, we add layers to the backbone network and fine-tune the entire network to output the health indicators even with a limited number of ground-truth data points. Experimental results show that the proposed method outperforms the other methods, namely training from scratch and an auto-encoder-based pre-training and fine-tuning approach, and achieves relatively high estimation accuracy for the body composition-related health indicators except for the body fat-related ones.
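
    A minimal PyTorch-style sketch of the two-step training scheme is shown below; the layer sizes, the number of gait primitives (4), and the number of health indicators (3) are assumptions for illustration, not the paper's configuration.

        import torch
        import torch.nn as nn

        # Step 1: backbone pre-trained on the auxiliary dataset (GEI -> gait primitives).
        backbone = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(), nn.Linear(32, 4),              # 4 assumed gait primitives
        )
        # ... pre-train `backbone` with a regression loss on the auxiliary labels ...

        # Step 2: add layers and fine-tune the whole network on the small ground-truth set
        # (GEI -> health indicators).
        model = nn.Sequential(backbone, nn.ReLU(), nn.Linear(4, 3))   # 3 assumed indicators
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
        criterion = nn.MSELoss()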

  • Unsupervised Building Damage Identification Using Post-Event Optical Imagery and Variational Autoencoder

    Daming LIN  Jie WANG  Yundong LI  

     
    LETTER-Image Recognition, Computer Vision

    Publicized: 2021/07/20  Vol: E104-D No:10  Page(s): 1770-1774

    Rapid building damage identification plays a vital role in rescue operations when disasters strike, especially when rescue resources are limited. In recent years, supervised machine learning has made considerable progress in building damage identification. However, its use remains challenging for two reasons: 1) the massive number of samples in current damage imagery is difficult to label and thus cannot satisfy the training requirements of deep learning, and 2) the similarity between partially damaged and undamaged buildings is high, hindering accurate classification. Leveraging the abundant samples of auxiliary domains, domain adaptation aims to transfer a classifier trained on historical damage imagery to the current task. However, traditional domain adaptation approaches do not fully consider category-specific information during feature adaptation, which may cause negative transfer. To address this issue, we propose a novel domain adaptation framework that individually aligns each category of the target domain to that of the source domain. Our method combines a variational autoencoder (VAE) with a Gaussian mixture model (GMM). First, the GMM is established to characterize the distribution of the source domain. Then, the VAE is constructed to extract the features of the target domain. Finally, the Kullback-Leibler (KL) divergence is minimized to force the features of the target domain to follow the GMM of the source domain. Two damage detection tasks using post-earthquake and post-hurricane imagery are used to verify the effectiveness of our method. Experiments show that the proposed method obtains improvements of 4.4% and 9.5%, respectively, compared with the conventional method.
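
    A minimal sketch of the alignment idea, assuming features have already been extracted: fit a GMM on source-domain features with scikit-learn, then penalize target-domain features (from the VAE encoder) by their negative log-likelihood under that GMM, used here as a differentiable proxy for the KL alignment term; `source_feats` and `num_classes` are hypothetical placeholders.

        import numpy as np
        import torch
        from sklearn.mixture import GaussianMixture

        # Step 1 (source domain): one GMM mode per category, fitted on source features.
        gmm = GaussianMixture(n_components=num_classes, covariance_type="diag").fit(source_feats)
        mu = torch.tensor(gmm.means_, dtype=torch.float32)          # (K, D)
        var = torch.tensor(gmm.covariances_, dtype=torch.float32)   # (K, D)
        pi = torch.tensor(gmm.weights_, dtype=torch.float32)        # (K,)

        def gmm_nll(z):
            """Negative log-likelihood of target features z (N, D) under the source GMM."""
            diff = z.unsqueeze(1) - mu                               # (N, K, D)
            log_comp = -0.5 * ((diff ** 2) / var + torch.log(2 * np.pi * var)).sum(-1)
            return -torch.logsumexp(torch.log(pi) + log_comp, dim=1).mean()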

  • Conditional Wasserstein Generative Adversarial Networks for Rebalancing Iris Image Datasets

    Yung-Hui LI  Muhammad Saqlain ASLAM  Latifa Nabila HARFIYA  Ching-Chun CHANG  

     
    PAPER-Artificial Intelligence, Data Mining

    Publicized: 2021/06/01  Vol: E104-D No:9  Page(s): 1450-1458

    The recent development of deep learning-based generative models has sharply intensified interest in data synthesis and its applications. Data synthesis takes on added importance especially for pattern recognition tasks in which some classes of data are rare and difficult to collect. In an iris dataset, for instance, the minority-class samples include images of eyes with glasses, oversized or undersized pupils, misaligned iris locations, and irises occluded or contaminated by eyelids, eyelashes, or lighting reflections. Such class-imbalanced datasets often result in biased classification performance. Generative adversarial networks (GANs) are one of the most promising frameworks that learn to generate synthetic data through a two-player minimax game between a generator and a discriminator. In this paper, we utilize the state-of-the-art conditional Wasserstein generative adversarial network with gradient penalty (CWGAN-GP) to generate minority-class iris images, which saves a huge amount of human labor otherwise spent on rare data collection. With our model, researchers can generate as many iris images of rare cases as they want, which helps in developing deep learning algorithms whenever a large dataset is needed.
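
    A minimal sketch of the gradient-penalty term used in WGAN-GP training is given below; `critic(x, labels)` is assumed to be a conditional critic, and the code is illustrative rather than the paper's implementation.

        import torch

        def gradient_penalty(critic, real, fake, labels, lambda_gp=10.0):
            """WGAN-GP penalty computed on random interpolates of real and fake images."""
            eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
            interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
            score = critic(interp, labels)
            grads = torch.autograd.grad(outputs=score.sum(), inputs=interp, create_graph=True)[0]
            grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
            return lambda_gp * ((grad_norm - 1) ** 2).mean()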

  • Video Inpainting by Frame Alignment with Deformable Convolution

    Yusuke HARA  Xueting WANG  Toshihiko YAMASAKI  

     
    PAPER-Image Processing and Video Processing

    Publicized: 2021/04/22  Vol: E104-D No:8  Page(s): 1349-1358

    Video inpainting is the task of filling in missing regions of a video. In this task, it is important to efficiently use information from other frames and to generate plausible results with sufficient temporal consistency. In this paper, we present a video inpainting method that jointly uses affine transformation and deformable convolutions for frame alignment. The former is responsible for frame-scale rough alignment, and the latter performs pixel-level fine alignment. Our model depends neither on 3D convolutions, which limit the temporal window, nor on troublesome flow estimation. The proposed method achieves improved object removal results and better PSNR and SSIM values than previous learning-based methods.
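
    A minimal sketch of pixel-level alignment with a deformable convolution, using torchvision's DeformConv2d; the channel sizes, kernel size, and the way offsets are predicted are assumptions for illustration, not the paper's architecture.

        import torch
        import torch.nn as nn
        from torchvision.ops import DeformConv2d

        class DeformAlign(nn.Module):
            """Predict per-pixel offsets from reference/target features, then align
            the target features with a deformable convolution."""
            def __init__(self, channels=64, kernel_size=3):
                super().__init__()
                pad = kernel_size // 2
                self.offset_conv = nn.Conv2d(channels * 2, 2 * kernel_size * kernel_size,
                                             kernel_size, padding=pad)
                self.deform_conv = DeformConv2d(channels, channels, kernel_size, padding=pad)

            def forward(self, ref_feat, tgt_feat):
                offsets = self.offset_conv(torch.cat([ref_feat, tgt_feat], dim=1))
                return self.deform_conv(tgt_feat, offsets)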

  • CJAM: Convolutional Neural Network Joint Attention Mechanism in Gait Recognition

    Pengtao JIA  Qi ZHAO  Boze LI  Jing ZHANG  

     
    PAPER

    Publicized: 2021/04/28  Vol: E104-D No:8  Page(s): 1239-1249

    Gait recognition distinguishes one individual from others according to the natural patterns of human gait. It is a challenging signal processing technology for biometric identification due to the ambiguity of contours and the complex feature extraction procedure. In this work, we propose a new model, the convolutional neural network (CNN) joint attention mechanism (CJAM), to classify gait sequences and conduct person identification using the CASIA-A and CASIA-B gait datasets. The CNN model extracts gait features, and the attention mechanism continuously focuses on the most discriminative area to achieve person identification. We present a complete pipeline from gait image preprocessing to final identification. The results of 12 experiments show that the new attention model leads to a lower error rate than the others. The CJAM model improves on the 3D-CNN, CNN-LSTM (long short-term memory), and simple CNN models by 8.44%, 2.94%, and 1.45%, respectively.
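
    A minimal sketch of a CNN with attention-weighted pooling over spatial locations is shown below; the layer sizes, the simple attention form, and the number of identities are assumptions for illustration, not the exact CJAM architecture.

        import torch
        import torch.nn as nn

        class AttentionGaitNet(nn.Module):
            def __init__(self, num_ids=124):          # 124 subjects assumed (CASIA-B scale)
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                )
                self.attn = nn.Conv2d(64, 1, 1)       # per-location attention scores
                self.classifier = nn.Linear(64, num_ids)

            def forward(self, x):                     # x: (B, 1, H, W) gait silhouette/GEI
                f = self.features(x)                  # (B, 64, H/4, W/4)
                w = torch.softmax(self.attn(f).flatten(2), dim=-1)   # (B, 1, HW)
                pooled = (f.flatten(2) * w).sum(-1)   # attention-weighted pooling -> (B, 64)
                return self.classifier(pooled)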

  • Hybrid Electrical/Optical Switch Architectures for Training Distributed Deep Learning in Large-Scale

    Thao-Nguyen TRUONG  Ryousei TAKANO  

     
    PAPER-Information Network

    Publicized: 2021/04/23  Vol: E104-D No:8  Page(s): 1332-1339

    Data parallelism is the dominant method used to train deep learning (DL) models on high-performance computing systems such as large-scale GPU clusters. When training a DL model on a large number of nodes, inter-node communication becomes a bottleneck due to its higher latency and lower link bandwidth compared with intra-node communication. Although several communication techniques have been proposed to cope with this problem, all of these approaches aim to mitigate the large-message-size issue and thereby diminish the effect of the limited inter-node network. In this study, we investigate the benefit of increasing inter-node link bandwidth by using hybrid switching systems, i.e., electrical packet switching and optical circuit switching. We found that the typical data transfers of synchronous data-parallel training are long-lived and rarely change, so they can be sped up with optical switching. Simulation results on the SimGrid simulator show that our approach speeds up the training of deep learning applications, especially at large scale.
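
    A back-of-the-envelope model of how inter-node link bandwidth affects the gradient all-reduce time illustrates the motivation; the ring all-reduce cost model and the bandwidth/latency numbers below are illustrative assumptions, not the paper's SimGrid setup.

        # Ring all-reduce time: t = 2*(N-1)/N * (S/B) + 2*(N-1)*latency
        def allreduce_time(msg_bytes, nodes, bandwidth_gbps, latency_s):
            bw = bandwidth_gbps * 1e9 / 8                      # bytes per second
            return 2 * (nodes - 1) / nodes * (msg_bytes / bw) + 2 * (nodes - 1) * latency_s

        grad_bytes = 100e6   # ~100 MB of gradients per iteration (illustrative)
        nodes = 32
        print(allreduce_time(grad_bytes, nodes, bandwidth_gbps=10, latency_s=5e-6))    # electrical packet switching
        print(allreduce_time(grad_bytes, nodes, bandwidth_gbps=100, latency_s=5e-6))   # optical circuit switching (assumed 10x bandwidth)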

  • Capsule Network with Shortcut Routing (Open Access)

    Thanh Vu DANG  Hoang Trong VO  Gwang Hyun YU  Jin Young KIM  

     
    PAPER-Image

    Publicized: 2021/01/27  Vol: E104-A No:8  Page(s): 1043-1050

    Capsules are fundamental informative units that are introduced into capsule networks to manipulate the hierarchical presentation of patterns. The part-whole relationship of an entity is learned through capsule layers, using a routing-by-agreement mechanism that is approximated by a voting procedure. Nevertheless, existing routing methods are computationally inefficient. We address this issue by proposing a novel routing mechanism, namely "shortcut routing", that directly learns to activate global capsules from local capsules. In our method, the number of operations in the routing procedure is reduced by omitting the capsules in intermediate layers, resulting in lighter routing. To further address the computational problem, we investigate an attention-based approach and propose fuzzy coefficients, which have been found to be more efficient than the mixture coefficients from EM routing. Our method achieves on-par classification results on the MNIST (99.52%), smallNORB (93.91%), and affNIST (89.02%) datasets. Compared to EM routing, our fuzzy-based and attention-based routing methods attain reductions of 1.42 and 2.5 times in the number of calculations.

  • An Efficient Deep Learning Based Coarse-to-Fine Cephalometric Landmark Detection Method

    Yu SONG  Xu QIAO  Yutaro IWAMOTO  Yen-Wei CHEN  Yili CHEN  

     
    PAPER-Image Processing and Video Processing

    Publicized: 2021/05/14  Vol: E104-D No:8  Page(s): 1359-1366

    Accurate and automatic quantitative cephalometric analysis is of great importance in orthodontics. The fundamental step of cephalometric analysis is to annotate anatomical landmarks of interest on X-ray images, and computer-aided automatic annotation remains an open topic. In this paper, we propose an efficient deep learning-based coarse-to-fine approach for accurate landmark detection. In the coarse detection step, we train a deep learning-based deformable transformation model using the training samples. We register test images to a reference image (one training image) using the trained model to predict coarse landmark locations on the test images, from which regions of interest (ROIs) containing the landmarks can be located. In the fine detection step, we use trained deep convolutional neural networks (CNNs) to detect landmarks in the ROI patches. For each landmark, there is one corresponding neural network, which directly regresses the landmark's coordinates; the fine step can thus be considered a refinement of the coarse detection step. We validated the proposed method on the public dataset from the 2015 International Symposium on Biomedical Imaging (ISBI) grand challenge. Compared with the state-of-the-art method, we not only achieve comparable detection accuracy (a mean radial error of about 1.0-1.6 mm) but also largely shorten the computation time (4 seconds per image).
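
    A minimal sketch of the fine-detection step, assuming coarse landmark estimates are available: crop an ROI patch around each coarse location and regress the refined (x, y) with a small per-landmark CNN. The patch size and layer sizes are assumptions, not the paper's configuration.

        import torch
        import torch.nn as nn

        class LandmarkRefiner(nn.Module):
            """One small CNN per landmark that regresses the (x, y) position inside the ROI."""
            def __init__(self):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
                    nn.Flatten(), nn.Linear(32, 2),
                )

            def forward(self, roi_patch):             # roi_patch: (B, 1, size, size)
                return self.net(roi_patch)

        def crop_roi(image, coarse_xy, size=64):
            """Crop a size x size patch centered on the coarse landmark estimate."""
            x, y = int(coarse_xy[0]), int(coarse_xy[1])
            half = size // 2
            return image[..., y - half:y + half, x - half:x + half]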

  • Multi-View Texture Learning for Face Super-Resolution

    Yu WANG  Tao LU  Feng YAO  Yuntao WU  Yanduo ZHANG  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2021/03/24  Vol: E104-D No:7  Page(s): 1028-1038

    In recent years, single face image super-resolution (SR) using deep neural networks has been well developed. However, most face images captured by cameras in real scenes are taken from different views of the same person, and existing multi-frame image SR requires alignment between images. Because multi-view face images contain texture information from different views, which can serve as effective prior information, how to use this prior to reconstruct frontal face images is challenging. To effectively solve this problem, we propose a novel face SR network based on multi-view face images, which focuses on obtaining more texture information from multi-view face images to help the reconstruction of frontal face images. In this network, we also propose a texture attention mechanism that transfers high-precision texture compensation information to the frontal face image to obtain better visual effects. We conduct subjective and objective evaluations, and the experimental results show the great potential of multi-view face image SR. Comparison with other state-of-the-art deep learning SR methods proves that the proposed method has excellent performance.

  • Secret Key Generation Scheme Based on Deep Learning in FDD MIMO Systems

    Zheng WAN  Kaizhi HUANG  Lu CHEN  

     
    LETTER-Artificial Intelligence, Data Mining

    Publicized: 2021/04/07  Vol: E104-D No:7  Page(s): 1058-1062

    In this paper, a deep learning-based secret key generation (SKG) scheme is proposed for FDD multiple-input multiple-output (MIMO) systems. We build an encoder-decoder-based convolutional neural network that characterizes the wireless environment and learns the mapping between the uplink and downlink channels. The designed neural network can accurately predict the downlink channel state information from the estimated uplink channel state information without any information feedback. Random secret keys can then be generated from the downlink channel responses predicted by the neural network. Simulation results show that the deep learning-based SKG scheme achieves significant performance improvement in terms of the key agreement ratio and the achievable secret key rate.
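
    A minimal sketch of the idea, assuming CSI is represented as a 2-channel (real/imaginary) tensor: an encoder-decoder CNN predicts the downlink CSI from the uplink CSI, and key bits are then quantized from the prediction. The layer sizes and the simple median-threshold quantizer are assumptions, not the paper's design.

        import torch
        import torch.nn as nn

        csi_predictor = nn.Sequential(                     # uplink CSI -> downlink CSI
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),    # encoder
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),                # decoder
        )

        def quantize_key(channel_response):
            """Turn a predicted channel response into key bits by thresholding at the median."""
            flat = channel_response.flatten()
            return (flat > flat.median()).to(torch.uint8)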

  • Attention Voting Network with Prior Distance Augmented Loss for 6DoF Pose Estimation

    Yong HE  Ji LI  Xuanhong ZHOU  Zewei CHEN  Xin LIU  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2021/03/26  Vol: E104-D No:7  Page(s): 1039-1048

    6DoF pose estimation from a monocular RGB image is a challenging but fundamental task. Methods based on a unit direction vector-field representation and a Hough voting strategy have achieved state-of-the-art performance. Nevertheless, they apply the smooth L1 loss to learn the two elements of the unit vector separately, which does not take into account the prior distance between the pixel and the keypoint, even though the positioning error is significantly affected by this distance. In this work, we propose a Prior Distance Augmented Loss (PDAL) that exploits the prior distance for a more accurate vector-field representation. Furthermore, we propose a lightweight channel-level attention module for adaptive feature fusion. Embedding this Adaptive Fusion Attention Module (AFAM) into a U-Net, we build an Attention Voting Network to further improve the performance of our method. We conduct extensive experiments to demonstrate the effectiveness and performance improvement of our methods on the LINEMOD, OCCLUSION and YCB-Video datasets. Our experiments show that the proposed methods bring significant performance gains and outperform state-of-the-art RGB-based methods without any post-refinement.
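
    A minimal sketch of a prior-distance-weighted smooth L1 loss is given below; the exact weighting form is an assumption for illustration, not necessarily the paper's PDAL formulation.

        import torch
        import torch.nn.functional as F

        def prior_distance_augmented_loss(pred_vec, gt_vec, pixel_xy, keypoint_xy):
            """Weight each pixel's direction-vector error by its prior distance to the
            keypoint, since distant pixels turn small angular errors into large
            positioning errors."""
            per_pixel = F.smooth_l1_loss(pred_vec, gt_vec, reduction="none").sum(dim=-1)
            prior_dist = (pixel_xy - keypoint_xy).norm(dim=-1)
            weight = prior_dist / (prior_dist.mean() + 1e-8)
            return (weight * per_pixel).mean()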

  • Preliminary Performance Analysis of Distributed DNN Training with Relaxed Synchronization

    Koichi SHIRAHATA  Amir HADERBACHE  Naoto FUKUMOTO  Kohta NAKASHIMA  

     
    BRIEF PAPER

    Publicized: 2020/12/01  Vol: E104-C No:6  Page(s): 257-260

    Scalability of distributed DNN training can be limited by the slowdown of specific processes due to unexpected hardware failures. We propose a dynamic process exclusion technique that maximizes training throughput. Our evaluation using 32 processes with ResNet-50 shows that, by excluding the slow processes, the proposed technique reduces the slowdown by 12.5% to 50% without accuracy loss.

  • Differentially Private Neural Networks with Bounded Activation Function

    Kijung JUNG  Hyukki LEE  Yon Dohn CHUNG  

     
    LETTER-Artificial Intelligence, Data Mining

    Publicized: 2021/03/18  Vol: E104-D No:6  Page(s): 905-908

    Deep learning has shown outstanding performance in various fields and is increasingly deployed in privacy-critical domains. If the sensitive data in a deep learning model are exposed, serious privacy threats can result. To protect individual privacy, we propose a novel activation function and a stochastic gradient descent procedure for applying differential privacy to deep learning. Through experiments, we show that the proposed method effectively protects privacy and performs better than previous approaches.
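
    For context, a minimal sketch of the standard DP-SGD recipe (per-sample gradient clipping plus Gaussian noise) is shown below; the paper's additional contribution, a bounded activation function, is not reproduced here, and all hyper-parameters are illustrative.

        import torch

        def dp_sgd_step(model, loss_fn, x_batch, y_batch, lr=0.1, clip=1.0, noise_mult=1.0):
            grads = [torch.zeros_like(p) for p in model.parameters()]
            for x, y in zip(x_batch, y_batch):
                model.zero_grad()
                loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
                # Clip each per-sample gradient to an L2 norm of `clip`.
                norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in model.parameters()))
                scale = torch.clamp(clip / (norm + 1e-8), max=1.0)
                for g, p in zip(grads, model.parameters()):
                    g += p.grad * scale
            with torch.no_grad():
                for g, p in zip(grads, model.parameters()):
                    g += noise_mult * clip * torch.randn_like(g)   # calibrated Gaussian noise
                    p -= lr * g / len(x_batch)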

  • HAIF: A Hierarchical Attention-Based Model of Filtering Invalid Webpage

    Chaoran ZHOU  Jianping ZHAO  Tai MA  Xin ZHOU  

     
    PAPER

    Publicized: 2021/02/25  Vol: E104-D No:5  Page(s): 659-668

    In Internet applications, when users search for information, search engines invariably return some invalid webpages that do not contain valid information. These invalid webpages interfere with users' access to useful information, reduce the efficiency of information queries, and occupy Internet resources. Accurate and fast filtering of invalid webpages can purify the Internet environment and provide convenience for users. This paper proposes an invalid webpage filtering model (HAIF) based on deep learning and a hierarchical attention mechanism. HAIF improves the semantic and sequence information representation of webpage text by concatenating lexical-level embeddings and paragraph-level embeddings. It introduces a hierarchical attention mechanism to optimize the extraction of text sequence features and webpage tag features. The local-level attention layer optimizes the local information in the plain text; by concatenating the input embeddings with the feature matrix after local-level attention calculation, it enriches the information representation. The tag-level attention layer introduces webpage structural feature information into the attention calculation over different HTML tags, so that HAIF is better suited to the Internet resource domain. To evaluate the effectiveness of HAIF in filtering invalid pages, we conducted various experiments. The experimental results demonstrate that, compared with other baseline models, HAIF improves on all evaluation criteria to various degrees.

  • MTGAN: Extending Test Case set for Deep Learning Image Classifier

    Erhu LIU  Song HUANG  Cheng ZONG  Changyou ZHENG  Yongming YAO  Jing ZHU  Shiqi TANG  Yanqiu WANG  

     
    PAPER-Software Engineering

    Publicized: 2021/02/05  Vol: E104-D No:5  Page(s): 709-722

    In recent years, deep learning has achieved excellent results in image recognition, voice processing, and other research areas, setting off a new wave of research and application. However, internal defects and external malicious attacks may threaten the safe and reliable operation of a deep learning system and even cause unbearable consequences. The technology for testing deep learning systems is still in its infancy, and traditional software testing techniques are not applicable to deep learning systems. In addition, characteristics of deep learning such as complex application scenarios, the high dimensionality of input data, and the poor interpretability of operation logic bring new challenges to testing. This paper focuses on the problem of test case generation and points out that adversarial examples can be used as test cases. The paper then proposes MTGAN, a framework for generating test cases for deep learning image classifiers based on generative adversarial networks. Finally, the paper evaluates the effectiveness of MTGAN.

  • Action Recognition Using Pose Data in a Distributed Environment over the Edge and Cloud

    Chikako TAKASAKI  Atsuko TAKEFUSA  Hidemoto NAKADA  Masato OGUCHI  

     
    PAPER

    Publicized: 2021/02/02  Vol: E104-D No:5  Page(s): 539-550

    With the development of cameras and sensors and the spread of cloud computing, life logs can easily be acquired and stored in general households for the various services that utilize them. However, it is difficult to analyze moving images acquired by home sensors in real time using machine learning because the data size is too large and the computational complexity is too high. Moreover, collecting and accumulating in the cloud moving images that are captured at home and that can be used to identify individuals may invade the privacy of application users. We propose a method of distributed processing over the edge and cloud that addresses both the processing latency and the privacy concerns. On the edge (sensor) side, we extract feature vectors of human key points from the moving images using OpenPose, a pose estimation library. On the cloud side, we recognize actions by machine learning using only the feature vectors. In this study, we compare the action recognition accuracies of multiple machine learning methods. In addition, we measure the analysis processing time at the sensor and in the cloud to investigate the feasibility of recognizing actions in real time. We then evaluate the proposed system by comparing it with the 3D ResNet model in recognition experiments. The experimental results demonstrate that the action recognition accuracy is highest when using an LSTM, and that introducing dropout in action recognition with 100 categories alleviates overfitting, because the models can learn more generic human actions when the variety of actions is increased. It is also demonstrated that preprocessing with OpenPose on the sensor side can substantially reduce the amount of data transferred from the sensor to the cloud.
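
    A minimal sketch of the cloud-side recognizer, assuming each frame is reduced to an OpenPose keypoint vector (e.g., 18 keypoints x (x, y, confidence) = 54 features): an LSTM with dropout classifies the sequence into action categories. The feature dimension, hidden size, and category count are assumptions for illustration.

        import torch
        import torch.nn as nn

        class PoseActionLSTM(nn.Module):
            def __init__(self, feat_dim=54, hidden=128, num_actions=100):
                super().__init__()
                self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
                self.dropout = nn.Dropout(0.5)
                self.fc = nn.Linear(hidden, num_actions)

            def forward(self, seq):                    # seq: (batch, frames, feat_dim)
                _, (h_n, _) = self.lstm(seq)
                return self.fc(self.dropout(h_n[-1]))  # classify from the last hidden state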

  • Backbone Alignment and Cascade Tiny Object Detecting Techniques for Dolphin Detection and Classification

    Yih-Cherng LEE  Hung-Wei HSU  Jian-Jiun DING  Wen HOU  Lien-Shiang CHOU  Ronald Y. CHANG  

     
    PAPER-Image

    Publicized: 2020/09/29  Vol: E104-A No:4  Page(s): 734-743

    Automatic tracking and classification are essential for studying the behaviors of wild animals. Dynamic far-distance shots, occlusion, protective coloration, and background noise are irregular sources of interference when designing a computerized algorithm intended to reduce human labeling effort. Moreover, wild dolphin images are hard to acquire through on-the-spot investigations, which involve long waiting times, and it is impractical to set up a fixed camera to automatically monitor dolphins on the ocean over several days. It is therefore challenging to accurately detect and classify a dolphin in such noisy photos using a single off-the-shelf deep learning method on a small dataset. In this study, we propose a generic Cascade Small Object Detection (CSOD) algorithm for dolphin detection that handles small-object problems, and we develop visualization-to-backbone-based classification (V2BC) for removing noise, highlighting dolphin features, and classifying individual dolphins. The architecture of CSOD consists of a P-net and an F-net. The P-net uses a coarse YOLOv3 detector as its core network to predict all regions of interest (ROIs) on lower-resolution images. Then the F-net, which is more robust, is applied to the ROIs taken from the high-resolution photos to overcome the limitations of a single detector. Moreover, the V2BC method focuses on extracting significant regions of occluded dolphins and designs post-processing that references the dolphin's backbone to facilitate classification. All experiments show that the proposed algorithm based on CSOD and V2BC outperforms state-of-the-art methods, including Faster R-CNN and YOLOv3 for detection, and AlexNet, VGG, and ResNet for classification. Compared with related classification work, the accuracy of the proposed method is over 14% higher, and the proposed CSOD detection system performs 42% better than the original YOLOv3 architecture.
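
    A minimal sketch of the cascade idea: run a coarse detector on a downscaled image to propose ROIs, then re-detect inside each ROI at full resolution. The helper functions coarse_detector, fine_detector, and crop are hypothetical placeholders, not the paper's implementation.

        def cascade_small_object_detection(image, downscale=4):
            # Stage 1 (P-net): coarse detection on a downscaled image.
            small = image[::downscale, ::downscale]
            proposals = coarse_detector(small)         # list of (x, y, w, h) in downscaled coords
            # Stage 2 (F-net): robust re-detection inside each ROI at full resolution.
            detections = []
            for (x, y, w, h) in proposals:
                roi = crop(image, x * downscale, y * downscale, w * downscale, h * downscale)
                detections.extend(fine_detector(roi))
            return detections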

  • Deep Network for Parametric Bilinear Generalized Approximate Message Passing and Its Application in Compressive Sensing under Matrix Uncertainty

    Jingjing SI  Wenwen SUN  Chuang LI  Yinbo CHENG  

     
    LETTER-Digital Signal Processing

    Publicized: 2020/09/29  Vol: E104-A No:4  Page(s): 751-756

    Deep learning is playing an increasingly important role in the signal processing field due to its excellent performance on many inference problems. Parametric bilinear generalized approximate message passing (P-BiG-AMP) is a new approximate message passing based approach to a general class of structured-matrix bilinear estimation problems. In this letter, we propose a novel feed-forward neural network architecture that realizes the P-BiG-AMP methodology with deep learning for the inference problem of compressive sensing under matrix uncertainty. The linear transforms utilized in the recovery process and the parameters involved in the input and output channels of the measurement are jointly learned from training data. Simulation results show that the trained P-BiG-AMP network achieves higher reconstruction performance than the P-BiG-AMP algorithm with parameters tuned via the expectation-maximization method.

  • Benchmarking Modern Edge Devices for AI Applications

    Pilsung KANG  Jongmin JO  

     
    PAPER-Computer System

    Publicized: 2020/12/08  Vol: E104-D No:3  Page(s): 394-403

    AI (artificial intelligence) has grown at an overwhelming speed over the last decade, to the extent that it has become one of the mainstream tools driving advancements in science and technology. Meanwhile, the paradigm of edge computing has emerged as one of the foremost areas in which applications using AI technology are most actively researched, due to its potential benefits and impact on today's widespread networked computing environments. In this paper, we evaluate two major entry-level offerings in state-of-the-art edge device technology, which highlight increased computing power and specialized hardware support for AI applications. We perform a set of deep learning benchmarks on the devices to measure their performance. By comparing the performance with that of other GPU (graphics processing unit) accelerated systems on different platforms, we assess the computational capability of modern edge devices featuring a significant amount of hardware parallelism.
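
    A minimal sketch of the kind of inference micro-benchmark one might run on such devices: measure the average single-image latency of a pretrained CNN. The model (ResNet-18), input size, and iteration counts are illustrative assumptions, not the paper's benchmark suite.

        import time
        import torch
        import torchvision.models as models

        model = models.resnet18(weights=None).eval()
        x = torch.randn(1, 3, 224, 224)

        with torch.no_grad():
            for _ in range(10):                        # warm-up iterations
                model(x)
            runs = 100
            start = time.perf_counter()
            for _ in range(runs):
                model(x)
        print(f"avg latency: {(time.perf_counter() - start) / runs * 1000:.2f} ms")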
